Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models

Authors

Nan Jiang

Wenge Rong

Min Gao

Yikang Shen

Zhang Xiong

Abstract

Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora, but they are extremely slow because they must predict candidate words from a large vocabulary during both training and inference. As an alternative to gradient approximation and to softmax with class decomposition, we explore the tree-based hierarchical softmax method and restructure its architecture, making it compatible with modern GPUs and introducing a compact tree-based loss function. When combined with several word hierarchical clustering algorithms, it achieves improved performance on the language modeling task, as measured by intrinsic evaluation criteria, on the PTB, WikiText-2 and WikiText-103 datasets.
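
The abstract does not spell out the restructured architecture, so what follows is only a minimal sketch of the general idea behind a GPU-friendly tree-based hierarchical softmax, written in NumPy rather than a GPU framework. It assumes a balanced binary tree built by recursively halving the word-id range, a stand-in for the word hierarchical clustering the paper combines with. Each word's root-to-leaf path is padded to a common depth with a mask, so log-probabilities for a whole batch come from one gather plus elementwise operations. The names `build_tree_paths`, `log_prob` and `hs_loss` are hypothetical; this is not the authors' code or their compact loss function.

```python
import numpy as np


def build_tree_paths(vocab_size):
    """Hypothetical helper: split word ids [0, vocab_size) recursively in half.

    Returns (nodes, codes, mask), each of shape (vocab_size, max_depth):
    nodes[w, d] is the internal-node id visited at depth d on the path to word w,
    codes[w, d] is 1.0 for a right branch and 0.0 for a left branch, and
    mask[w, d] is 0.0 where a path shorter than max_depth is padded.
    """
    if vocab_size < 2:
        raise ValueError("need at least two words")
    paths = [None] * vocab_size
    next_id = 0

    def split(lo, hi, prefix):
        nonlocal next_id
        if hi - lo == 1:  # leaf: one word
            paths[lo] = prefix
            return
        node = next_id  # new internal node (V - 1 of them in total)
        next_id += 1
        mid = (lo + hi) // 2
        split(lo, mid, prefix + [(node, 0.0)])
        split(mid, hi, prefix + [(node, 1.0)])

    split(0, vocab_size, [])
    depth = max(len(p) for p in paths)
    nodes = np.zeros((vocab_size, depth), dtype=np.int64)
    codes = np.zeros((vocab_size, depth))
    mask = np.zeros((vocab_size, depth))
    for w, path in enumerate(paths):
        for d, (node, code) in enumerate(path):
            nodes[w, d] = node
            codes[w, d] = code
            mask[w, d] = 1.0
    return nodes, codes, mask


def log_prob(h, targets, W, b, nodes, codes, mask):
    """Batched log p(target | h) with no Python loop over the batch.

    h: (B, H) hidden states; targets: (B,) word ids;
    W: (V - 1, H) internal-node vectors; b: (V - 1,) internal-node biases.
    """
    n = nodes[targets]                                # (B, D) node ids on each path
    logits = np.einsum("bh,bdh->bd", h, W[n]) + b[n]  # (B, D) node scores
    signs = 2.0 * codes[targets] - 1.0                # +1 = right, -1 = left
    step_logp = -np.logaddexp(0.0, -signs * logits)   # log sigmoid(sign * score)
    return (step_logp * mask[targets]).sum(axis=1)    # padded steps contribute 0


def hs_loss(h, targets, W, b, nodes, codes, mask):
    """Mean negative log-likelihood over the batch."""
    return -log_prob(h, targets, W, b, nodes, codes, mask).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, H = 10, 4
    nodes, codes, mask = build_tree_paths(V)
    W, b = rng.normal(size=(V - 1, H)), rng.normal(size=V - 1)
    h = rng.normal(size=(3, H))
    # Probabilities of all V words under each hidden state sum to 1.
    all_lp = np.stack([log_prob(h, np.full(3, w), W, b, nodes, codes, mask)
                       for w in range(V)], axis=1)
    print(np.allclose(np.exp(all_lp).sum(axis=1), 1.0))  # prints True
    print(hs_loss(h, np.array([0, 3, 9]), W, b, nodes, codes, mask))
```

Because sigmoid(x) + sigmoid(-x) = 1 at every internal node, the leaf probabilities form a valid distribution, and scoring one target costs O(log V) dot products instead of O(V). Using a clustering-derived hierarchy instead of the balanced split would only change how the paths are built.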

Similar articles

Incrementally Learning the Hierarchical Softmax Function for Neural Language Models

Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we present a training method that can incrementally train the hierarchical softmax function for NNLMs. We split the cost function to model old and update corpora separately, and factorize the objective function for the hierarchical softmax. Then we provide a new stochastic gradient based method to ...

Self-organized Hierarchical Softmax

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and s...

Hierarchical Multi-scale Attention Networks for action recognition

Recurrent Neural Networks (RNNs) have been widely used in natural language processing and computer vision. Among them, the Hierarchical Multi-scale RNN (HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn the hierarchical temporal structure from data automatically. In this paper, we extend the work to solve the computer vision task of action recognition. However, in seq...

Strategies for Training Large Vocabulary Neural Language Models

Training neural network language models over large vocabularies is computationally costly compared to count-based models such as Kneser-Ney. We present a systematic comparison of neural strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We extend self normalization to be a proper es...

Leaf-Smoothed Hierarchical Softmax for Ordinal Prediction

We propose a new approach to conditional probability estimation for ordinal labels. First, we present a specialized hierarchical softmax variant inspired by k-d trees that leverages the inherent spatial structure of (potentially-multivariate) ordinal labels. We then adapt ideas from signal processing on noisy graphs to develop a novel regularizer for such hierarchical softmax models. Both our t...

Publication date: 2017
